1. Introduction

Quantum computation has attracted interest in recent years because it appears to
violate the strong form of the Church-Turing thesis; quantum computers seem to be
fundamentally more powerful than any possible classical computer [1]. In 1994 Peter
Shor published efficient quantum algorithms for the prime factorization of integers
and the calculation of discrete logarithms modulo arbitrary primes [2]. Lov Grover's
1995 introduction of the quantum search algorithm provided a polynomial speedup for
unstructured searches [3]. As early as 1982 Richard Feynman pointed out the inherent
difficulties in simulating quantum systems with classical processors and suggested the
possibility that the use of quantum information processing could produce exponential
speedups in such simulations [4]. Subsequently efficient quantum algorithms for
performing simulations of physical systems were developed [5, 6, 7, 8, 9, 10, 11],
vindicating Feynman's prediction and further motivating theoretical and experimental
work towards realizing quantum computation.

In this paper we focus on the quantum circuit model of quantum computation [12].
In this setting a quantum computation is a unitary transformation applied to n ideal
qubits (we ignore decoherence throughout). Given the irrelevance of global phases, the
set of all such transformations is the special unitary group $SU(2^n)$. To represent an
element of $SU(2^n)$ by a circuit we must specify a fixed set of elementary gates which
act on a fixed number of qubits. A typical choice is the controlled-NOT (CNOT) and
arbitrary one-qubit gates. The length of a circuit is the number of elementary gates
which it contains; however, because of the relative difficulty of multi-qubit operations,
we shall only consider the number of CNOT gates in a circuit. There are several means
of physically implementing a quantum computation [13, 14, 15, 16, 17, 18]. One-qubit
local operations and a few two-qubit operations, such as the CNOT gate, have been
experimentally implemented [19, 20, 21, 22, 23, 24].

The set of all allowed transformations for a quantum computer forms the group
$SU(2^n)$, and a generic element of $SU(2^n)$ requires a circuit of $O(4^n)$ gates.
Specific transformations corresponding to efficient quantum algorithms are of particular
interest. A quantum algorithm specifies a circuit family, with a circuit defined for each
value of n. For a quantum algorithm to be efficient each of these circuits must be
composed of a number of operations bounded above by a polynomial in n. Each of
these operations must involve a subset of the n qubits with size bounded above by a
polynomial in the logarithm of n. Some algorithms, for example the quantum Fourier
transform, naturally decompose into elementary gates acting on qubits [25]. In other
cases, for example generic quantum Fourier transforms of functions on groups other
than $\mathbb{Z}/(2^n\mathbb{Z})$ [26, 27], and in application of phase estimation to problems of quantum
simulation [10, 8], bounded size operations arise which do not naturally factor into
elementary gates. Before such quantum algorithms may be implemented experimentally
one is therefore faced with a problem of quantum compilation: given a set of unitary
operators of fixed size and an elementary gate set, constructively produce the quantum
circuit realizing the operators.

It was shown by construction in 1995 that the set of one qubit operations and the 
CNOT are universal: any unitary operation on any number of qubits can be realized as 
a circuit over these gates. However, the number of CNOT gates required for n qubits
was of order $n^3 4^n$ [28]. Since 1995 a number of advances have been made towards the
CNOT optimization of universal quantum circuits. We divide these into three categories: 
circuit optimization, Lie algebra decompositions, and explicit algorithms. 

Knill proved that the asymptotic CNOT cost of universal quantum circuits could
be reduced by a factor of $n^2$ to $O(n4^n)$ [29]. In 2004 Shende, Markov and Bullock proved
the highest known lower bound on asymptotic CNOT cost, $\lceil \frac{1}{4}(4^n - 3n - 1) \rceil$ [30], and
Vartiainen, Möttönen and Salomaa simplified the best existing circuit using Gray codes
to achieve for the first time a leading order CNOT cost of $O(4^n)$ (in fact, for large n, the
cost was approximately $8.7 \times 4^n$), a multiplicative factor away from the highest known
lower bound [31]. Later that year, the same authors, along with Bergholm, presented
a decomposition based on the cosine-sine matrix decomposition (CSD) which produced
asymptotic behavior scaling as $4^n - 2^{n+1}$ [32]. Vatan and Williams published a three
CNOT universal two qubit gate along with a proof that fewer CNOTs could never
achieve universality [33], and proposed a 40 CNOT universal three qubit gate which
was, at the time, the best known [34]. The current best known circuit decomposition
applicable to systems of more than two qubits was introduced by Shende, Bullock
and Markov. Using intuition drawn from the Shannon decomposition of classical logic
circuit design, along with the application of some circuit identities, Shende, Bullock and
Markov have designed a universal circuit requiring 20 CNOTs in the three qubit case
and $\frac{23}{48}4^n - \frac{3}{2}2^n + \frac{4}{3}$ CNOTs asymptotically [35]. This decomposition is known as the
Quantum Shannon Decomposition (QSD), by analogy with the Shannon decomposition
of classical circuit design, and brings the upper bound on asymptotic CNOT cost to
within a factor of two of the highest known lower bound while halving the cost of
implementing a general three qubit gate to 20 CNOTs.

The second area of research is the exploration of the various ways of decomposing
the Lie algebra of the special unitary group. Essentially all of the work in this
area has made use of the Cartan decomposition. In the first part of the twentieth
century Cartan proved that (up to conjugacy) there exist only three types of Cartan
decomposition on the unitary Lie algebra, AI, AII and AIII [36, 37]. The CNOT optimal two
qubit circuit of Vatan and Williams [33] is, as described in detail below, based on a
type AI Cartan decomposition. Khaneja and Glaser proposed a scheme based on a
Cartan decomposition of $\mathfrak{su}(2^n)$ (now known as the Khaneja Glaser Decomposition,
or KGD) which lends itself to efficient recursive circuit decompositions [38], and,
working with Brockett, they showed that this scheme was time optimal for NMR based
implementations of quantum computation [39]. Bullock identified the Khaneja Glaser
Decomposition, as well as the CSD, as type AIII Cartan decompositions and thereby
established an equivalence between the two [40]. The KGD was used by Vatan and
Williams to produce their efficient two and three qubit circuits [34, 33]. Bullock and
Brennen and more recently Dagli, D'Alessandro and Smith have used type AI and
AII decompositions, including the Concurrence Canonical Decomposition (CCD) and
the Odd-Even Decomposition (OED), to study entanglement dynamics in quantum
circuits [311 132].

In order to make practical use of a CNOT optimized quantum circuit or a novel 
Lie algebra decomposition it is necessary to have an algorithm which can extract the 
parameters which appear in the decomposition from an arbitrary unitary operation. 
Sousa and Ramos provided an algorithm based on the generalized singular value
decomposition for computing the parameters in a CNOT optimized two qubit circuit (the
parameters for Vatan and Williams' circuit can be extracted from their algorithm with
a little algebra, and other equivalent circuits can be computed with a similar amount
of effort) [43]. Just as Vatan and Williams' work on small numbers of qubits does not
generalize to larger operators, however, Sousa and Ramos' algorithm does not generalize
beyond two qubits. Earp and Pachos provided a constructive algorithm to perform a
type AIII Cartan decomposition of an arbitrary n qubit operator (they use the Khaneja
Glaser Decomposition specifically, but their algorithm can be modified to implement
other forms of the AIII decomposition) [44]. Earp and Pachos' algorithm relies on
numerical optimization and a truncation of the Baker-Campbell-Hausdorff formula.
Nakajima, Kawano and Sekigawa published the first algorithm to compute Cartan
decompositions of the unitary group making explicit use of Cartan involutions [45];
their algorithm computes parameters for circuits composed of uniformly controlled
operations, similar to the circuits produced by CSD based schemes. Their algorithm
requires $4^n - 2^{n-1}$ CNOT gates asymptotically. In the three-qubit case this number
can be reduced by taking advantage of the known CNOT-optimized two qubit circuit
developed by Vatan and Williams to produce a 44 CNOT universal three qubit circuit
(see Figure 2). Since a lower bound of $\frac{1}{4}(4^n - 3n - 1)$ has been proven on the asymptotic
CNOT cost of arbitrary n-qubit operations, with a lower bound of 14 CNOTs in the
three qubit case [30], this efficiency cannot be improved by more than a factor of
four. Circuits produced by Nakajima, Kawano and Sekigawa's algorithm are a factor of
two longer than circuits obtained from the Quantum Shannon Decomposition (QSD).
However, the QSD lacks a constructive Lie algebra based factoring algorithm in the
published literature so far. It is to this issue we turn in the remainder of the paper.

We first give some mathematical background introducing important definitions 
and theorems which will be used later in the work. We then discuss the important 
special cases of one and two qubit operations, and provide Cartan involution based 
algorithms for extracting parameters for CNOT optimal quantum circuits from arbitrary 
one and two qubit unitary operations. We then place the QSD, the best known circuit 
decomposition in terms of CNOT cost, into a Lie algebraic context by showing it to be 
an alternating series of Cartan decompositions. We define the Cartan involutions which 
correspond to these decompositions, and we show that these involutions can be used 
recursively to obtain the QSD for unitary operators on any number of qubits. 

2. Mathematical Background

In the interest of making our presentation more self-contained, we briefly review some 
basic definitions which will be important throughout this work. For a fuller presentation 
we refer the reader to [46, 47]. Throughout, we use $[ab]$ to denote the Lie bracket in
general, and the notation $[a, b]$ to denote the Lie bracket for matrix algebras where it is
the commutator $[a, b] = ab - ba$.

Definition 1: If a subalgebra $I$ of a Lie algebra $\mathfrak{g}$ satisfies the condition that
$[xy] \in I$ for all $x \in \mathfrak{g}$, $y \in I$, then $I$ is called an ideal in $\mathfrak{g}$.

Example 1: Clearly 0 and $\mathfrak{g}$ are trivial ideals of $\mathfrak{g}$. An important example of an
ideal is the derived algebra of $\mathfrak{g}$, denoted $[\mathfrak{g}\mathfrak{g}]$, which consists of all linear combinations
of brackets $[xy]$, with $x, y \in \mathfrak{g}$.

Definition 2: A non-abelian Lie algebra $S$ (i.e. $[SS] \neq 0$) in which the only ideals
are 0 and all of $S$ is called simple. Observe that since the derived algebra is an ideal, for
any simple Lie algebra $S$ the derived algebra is equal to the entire algebra: $[SS] = S$.

We may define a sequence of ideals, the derived series of an algebra $A$, as follows:

$$A^{(0)} = A, \quad A^{(1)} = [AA], \quad A^{(2)} = [A^{(1)}A^{(1)}], \quad \ldots, \quad A^{(i)} = [A^{(i-1)}A^{(i-1)}], \quad \ldots$$

If $A^{(n)} = 0$ for some n we call $A$ solvable. Observe that all abelian Lie algebras are
solvable, while all simple Lie algebras are nonsolvable. We shall simply state the fact
that every Lie algebra contains a unique maximal solvable ideal (maximal in the sense
that it is contained in no larger solvable ideal), which is referred to as the radical of
the algebra. If $L$ is a non-zero Lie algebra and $\mathrm{Rad}\, L = 0$, we call $L$ semi-simple. This
condition for the semi-simplicity of a Lie algebra is equivalent to the condition that the
algebra is the direct sum of simple Lie algebras. Most of the Lie algebras which occur
in physics are semi-simple, and there exists a very rich and well developed structure
theory of semi-simple Lie algebras which we shall exploit throughout the remainder of
this work. The essential structure theorem which lies behind the CSD, the KGD
and, as we shall show later, the QSD, is the Cartan decomposition.

Definition 3: A Cartan decomposition of a real semi-simple Lie algebra $\mathfrak{g}$ is a
decomposition $\mathfrak{g} = \mathfrak{m} \oplus \mathfrak{k}$ where $\mathfrak{m} = \mathfrak{k}^{\perp}$, for which $\mathfrak{k}$ and $\mathfrak{m}$ satisfy the commutation
relations:

$$[\mathfrak{k}, \mathfrak{k}] \subseteq \mathfrak{k} \tag{1}$$

$$[\mathfrak{m}, \mathfrak{k}] = \mathfrak{m} \tag{2}$$

$$[\mathfrak{m}, \mathfrak{m}] \subseteq \mathfrak{k} \tag{3}$$

A few further features of the Cartan decomposition are essential. 

Definition 4: Consider a semi-simple Lie algebra $\mathfrak{g}$ with Cartan decomposition
$\mathfrak{g} = \mathfrak{m} \oplus \mathfrak{k}$ and a subalgebra $\mathfrak{h}$ of $\mathfrak{g}$ contained in $\mathfrak{m}$. Because $[\mathfrak{m}, \mathfrak{m}] \subseteq \mathfrak{k}$, $\mathfrak{h}$ must
be Abelian. We refer to a maximal Abelian subalgebra contained in $\mathfrak{m}$ as a Cartan
subalgebra of $\mathfrak{g}$ and $\mathfrak{k}$.

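Definition 5: The Lie group $G$ acts on its Lie algebra $\mathfrak{g}$ through a conjugation,
known as the adjoint action, $\mathrm{Ad}_G : \mathfrak{g} \to \mathfrak{g}$, defined by

$$\mathrm{Ad}_U X = U^{\dagger} X U \tag{4}$$

for $U \in G$ and $X \in \mathfrak{g}$, and for $K = \exp(\mathfrak{k})$ we define the adjoint orbit of $X$ to be

$$\mathrm{Ad}_K X = \bigcup_{k \in K} \mathrm{Ad}_k X \tag{5}$$

Any two Cartan subalgebras $\mathfrak{h}$ and $\mathfrak{h}'$ are related to one another through the adjoint
action of the group $G$ on its Lie algebra $\mathfrak{g}$. With these definitions, we now state

Theorem 1: For any two maximal Abelian subalgebras $\mathfrak{h}$ and $\mathfrak{h}'$ in $\mathfrak{m}$ there is an
element $k \in K$ such that $\mathrm{Ad}_k(\mathfrak{h}) = \mathfrak{h}'$. Furthermore, the adjoint orbit of $\mathfrak{h}$ is equal to
$\mathfrak{m}$, i.e.

$$\mathfrak{m} = \bigcup_{k \in K} \mathrm{Ad}_k \mathfrak{h} \tag{6}$$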

Finally, we come to the key definition in this paper: 

Definition 6: Given a semi-simple Lie algebra $\mathfrak{g}$ with Cartan decomposition
$\mathfrak{g} = \mathfrak{m} \oplus \mathfrak{k}$ and a Cartan subalgebra $\mathfrak{h}$, let $A = \exp(\mathfrak{h})$ and $K = \exp(\mathfrak{k})$; then $G = KAK$
is called a (global) Cartan decomposition of the semi-simple Lie group $G$.

The theorem which establishes the existence of such a decomposition for any
semi-simple Lie group is proved in [171 HHJ HE]. The $G = KAK$ structure has been used
widely in work on quantum circuit decompositions in the past, most notably in Khaneja
and Glaser's work, as well as in CSD based circuit designs (as explained by Bullock [40])
and in subsequent work based on these decompositions (cf. e.g. [381 E21 EH H2]). The
task of computing the Cartan factors for a specific unitary matrix is greatly facilitated 
by the existence of Cartan involutions. 

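Definition 7: A Cartan involution, denoted $\theta$, is a non-identity automorphism on
a Lie algebra $\mathfrak{u}$ such that $\theta^2$ is the identity, and the global Cartan involution $\Theta$ has the
equivalent action on $U = \exp(\mathfrak{u})$ with the property that

$$\theta(g) = \begin{cases} g & g \in \mathfrak{k} \\ -g & g \in \mathfrak{m} \end{cases}, \qquad \Theta(G) = \begin{cases} G & G \in \exp(\mathfrak{k}) \\ G^{\dagger} & G \in \exp(\mathfrak{m}) \end{cases} \tag{7}$$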

In the case of $\mathfrak{su}(n)$ there are only three classes of Cartan decomposition, denoted
AI, AII, and AIII. The $\mathfrak{k}$ subalgebras of $\mathfrak{su}(n)$ are isomorphic to $\mathfrak{so}(n)$, $\mathfrak{sp}(\frac{n}{2})$, and
$\mathfrak{s}[\mathfrak{u}(p) \oplus \mathfrak{u}(q)]$ for any $p + q = n$ for AI, AII, and AIII decompositions, respectively
(AII only exists for unitary groups acting on an even number of dimensions, a common
situation in quantum information where the state-spaces of n-qubit registers have
dimension $2^n$) [42]. In this work we are particularly concerned with decompositions
of type AI and AIII because in certain important cases there are straightforward
and efficient means of physically implementing real orthogonal or direct sum unitary
operators. Since we are concerned in this work only with the unitary group, whose
elements satisfy the condition $U^{-1} = U^{\dagger}$, we may exploit the Cartan involution to
factor matrices.

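Theorem 2: For any $G \in SU(2^n)$ with Cartan decomposition $G = KM$, $K \in \exp(\mathfrak{k})$,
$M \in \exp(\mathfrak{m})$, $M^2$ is uniquely determined by $M^2 = \Theta(G^{\dagger})G$.

Proof: $\Theta(G^{\dagger})G = \Theta(M^{\dagger}K^{\dagger})KM = \Theta(M^{\dagger})\Theta(K^{\dagger})KM = MK^{\dagger}KM = M^2$. □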

A $KAK$ type decomposition of the special unitary group is desirable because
there is a considerable amount of freedom in selecting the $\mathfrak{k}$ subalgebra and a Cartan
subalgebra $\mathfrak{h}$, and with appropriate selection of $\mathfrak{k}$ and $\mathfrak{h}$ the factors returned for an
arbitrary unitary operator are of a form which may readily be translated into physically
realizable quantum gate sequences. Indeed, the Khaneja-Glaser Decomposition has
been shown to be time optimal for NMR quantum computing, as compared to other
published decompositions [39]. The existence of this decomposition is of no practical
use, however, without an algorithm for explicitly calculating the factors $K_1$, $K_2$ and $A$
for a given specific unitary matrix.

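Notation: When discussing the generators of the Lie algebras of multi-qubit
operator groups we will use a streamlined notation. We define $ZI = \sigma_z \otimes 1$, $IX = 1 \otimes \sigma_x$,
$ZY = \sigma_z \otimes \sigma_y$ and so on, where $\sigma_x$, $\sigma_y$ and $\sigma_z$ are the familiar Pauli spin matrices

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

Additionally, we define $X^{(n)}$ to be a Pauli-x (likewise $Y^{(n)}$ and $Z^{(n)}$) acting on the n-th qubit,
i.e. $Z^{(3)} = IIZ$.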

3. Special Cases: One and Two Qubits 

3.1. One qubit factoring: Euler Angle decomposition of SU(2) as a Cartan 
Decomposition 

We now provide a simple, illustrative example of a Cartan decomposition and an 
involution based algorithm for converting an arbitrary one qubit unitary operator into 
a Cartan inspired circuit. This is the simplest possible case of a Cartan decomposition
of a unitary group; however, the factoring of multi-qubit gates inevitably reduces in the
end to a series of one-qubit gates which must themselves be decomposed. The structure
of the algorithm for this simple example is identical to the more involved cases to follow.

Definition: The Lie algebra $\mathfrak{su}(2)$ is generated by the Pauli spin matrices. The
decomposition $\mathfrak{su}(2) = \mathfrak{k} \oplus \mathfrak{m}$ where $\mathfrak{k} = \mathrm{span}_{\mathbb{R}}\, i\{Y\}$ and $\mathfrak{m} = \mathrm{span}_{\mathbb{R}}\, i\{X, Z\}$ satisfies the
criteria to be a Cartan decomposition. Furthermore, either $\mathrm{span}_{\mathbb{R}}\, i\{X\}$ or $\mathrm{span}_{\mathbb{R}}\, i\{Z\}$
is a maximally abelian subalgebra of $\mathfrak{su}(2)$ contained in $\mathfrak{m}$. Thus the decomposition of
$SU(2)$ given by $U = e^{iAY} e^{iBZ} e^{iCY}$ is a Cartan decomposition. Using the fact that $SU(2)$
is the double cover of $SO(3)$, we recognize this Cartan decomposition as the Euler angle
decomposition of three dimensional rotations. We now explicitly calculate the Euler
angle decomposition of an arbitrary single qubit unitary using a Cartan involution.

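The Cartan involution corresponding to our chosen Cartan decomposition ($\mathfrak{k} =
\mathrm{span}_{\mathbb{R}}\, i\{Y\}$, $\mathfrak{m} = \mathrm{span}_{\mathbb{R}}\, i\{X, Z\}$ and $\mathfrak{h} = \mathrm{span}_{\mathbb{R}}\, i\{Z\}$) is $\theta(u) = YuY$, $\Theta(U) = YUY$. We
compute the Cartan $KAK$ decomposition of an arbitrary $G \in SU(2)$ as follows: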



1. We exploit Theorem 2 to calculate $M^2 = YG^{\dagger}YG$.

2. Diagonalize $M^2 = PDP^{\dagger}$. Note that as a diagonal element of $SU(2)$, $D$ must be of
the form $e^{iaZ}$, i.e. $D \in \exp(\mathfrak{h})$, and, furthermore, Theorem 1 implies that $P \in \exp(\mathfrak{k})$.

3. We now have $M = PD^{1/2}P^{\dagger}$ and we may find $K = GM^{\dagger}$.

4. This constitutes a complete decomposition of $G$ into the form $e^{iAY} e^{iBZ} e^{iCY}$: $G =
KPD^{1/2}P^{\dagger}$, and it is trivial to extract the angles $A$, $B$ and $C$ from the matrix forms of
these operators.
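
The four steps above can be sketched numerically as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation: the function names `rot_y`, `rot_z` and `euler_yzy` are hypothetical, and the sketch assumes that a real eigenvector matrix for $M^2$ can be obtained by diagonalizing its imaginary part (which holds because $M^2 = \cos(2r)\,1 + i\sin(2r)\,(n_x X + n_z Z)$ for an element of $\exp(\mathfrak{m})$).

```python
import numpy as np

Y = np.array([[0, -1j], [1j, 0]])

def rot_y(t):
    """exp(i t Y): a real 2x2 rotation matrix, an element of exp(k)."""
    return np.array([[np.cos(t), np.sin(t)],
                     [-np.sin(t), np.cos(t)]], dtype=complex)

def rot_z(t):
    """exp(i t Z): a diagonal element of exp(h)."""
    return np.diag([np.exp(1j * t), np.exp(-1j * t)])

def euler_yzy(G):
    """Angles (A, B, C) with G = exp(iAY) exp(iBZ) exp(iCY) for G in SU(2)."""
    # Step 1: M^2 = Theta(G^dagger) G with Theta(U) = Y U Y.
    M2 = Y @ G.conj().T @ Y @ G
    # Step 2: the imaginary part of M^2 is real symmetric, so eigh returns a
    # real orthogonal P; flip one column if needed so that det P = +1.
    _, P = np.linalg.eigh(M2.imag)
    if np.linalg.det(P) < 0:
        P[:, 0] = -P[:, 0]
    d = np.diag(P.T @ M2 @ P)          # D = diag(e^{2iB}, e^{-2iB})
    # Step 3: M = P D^{1/2} P^dagger and K = G M^dagger.
    B = 0.5 * np.angle(d[0])
    M = P @ rot_z(B) @ P.T
    K = G @ M.conj().T
    # Step 4: G = (K P) D^{1/2} P^dagger; K P and P^dagger are real rotations.
    KP = K @ P
    A = np.arctan2(KP[0, 1].real, KP[0, 0].real)
    C = np.arctan2(P[1, 0], P[0, 0])
    return A, B, C
```

The square root in step 3 is taken as $\mathrm{diag}(e^{iB}, e^{-iB})$ so that $M$ has unit determinant; other branch choices would move a sign between $K$ and $M$.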



3.2. Two qubit factoring from a Cartan decomposition. 



The task of factoring two qubit operators is facilitated by several unique properties of
$SU(4)$. Firstly, $SO(4)$ is the Lie group corresponding to the $\mathfrak{k}$ subalgebra of $\mathfrak{su}(4)$ under
a type AI involution. $SO(4)$ and the group of local operations acting on two qubits
separately, $SU(2) \otimes SU(2)$, share a simply connected covering group, Spin(4). In fact,
elements of $SO(4)$ are mapped uniquely onto elements of $SU(2) \otimes SU(2)$ by changing
to the "magic basis" of Bell states through conjugation by the matrix [33]:



$$B = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & i & 0 & 0 \\ 0 & 0 & i & 1 \\ 0 & 0 & i & -1 \\ 1 & -i & 0 & 0 \end{pmatrix} \tag{8}$$



There is no equivalent connection between $SO(2^n)$ and $SU(2^{n-1}) \otimes SU(2^{n-1})$ for n > 2.
As a result of this close connection between the type AI Cartan decomposition of $SU(4)$
and the group of local operations (which may be implemented without the use of CNOT
gates) it is possible to construct a universal 2 qubit circuit requiring only 3 CNOT gates
in the worst case (see Figure 1) [33| H5| I33| 150].

Definition: The involution for type AI Cartan decompositions of $\mathfrak{su}(N)$ is given by

$$\theta(u) = -u^T \text{ for } u \in \mathfrak{su}(N), \qquad \Theta(U) = (U^{-1})^T = U^* \text{ for } U \in SU(N). \tag{9}$$

The involution given by Equation (9) fixes a $\mathfrak{k}$-subalgebra corresponding to $\mathfrak{so}(4)$, $\mathfrak{k} =
\mathrm{span}_{\mathbb{R}}\, i\{IY, XY, ZY, YI, YX, YZ\}$, and the diagonal elements of $\mathfrak{m}$, i.e. $\mathfrak{h} =
\mathrm{span}_{\mathbb{R}}\, i\{IZ, ZI, ZZ\}$, constitute a Cartan subalgebra. Furthermore, as discussed at the
start of this section, a transformation to the basis of Bell states (the "magic basis") maps
this $\mathfrak{k}$ subalgebra onto $\mathfrak{su}(2) \oplus \mathfrak{su}(2)$ and also maps the maximal abelian subalgebra of
diagonal matrices onto the subalgebra chosen by both Khaneja and Glaser and Vatan
and Williams, $\mathfrak{h}' = \mathrm{span}_{\mathbb{R}}\, i\{XX, YY, ZZ\}$. As a result, we may use the Cartan involution
of Equation (9) and matrix diagonalization to compute the parameters necessary for
Vatan and Williams' two-qubit CNOT optimal circuit.
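
The claims in the preceding paragraph can be checked numerically. The following Python sketch (the helper names `pauli`, `theta_AI` and `pauli_support` are hypothetical, introduced only for illustration) sorts the fifteen two-qubit Pauli generators into the $+1$ and $-1$ eigenspaces of $\theta(u) = -u^T$, and expands $B\,D\,B^{\dagger}$ in the Pauli basis for the diagonal generators $D$, taking $B$ to be the magic-basis matrix of Equation (8).

```python
import itertools
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]]),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli(label):
    """Tensor product of Pauli matrices, e.g. pauli("ZY") = sigma_z (x) sigma_y."""
    out = np.eye(1, dtype=complex)
    for ch in label:
        out = np.kron(out, PAULI[ch])
    return out

def theta_AI(u):
    """Type AI Cartan involution on the Lie algebra: theta(u) = -u^T."""
    return -u.T

LABELS = ["".join(p) for p in itertools.product("IXYZ", repeat=2)]
GENERATORS = [l for l in LABELS if l != "II"]

# Generators i*P fixed by theta span k; those negated by theta span m.
k_labels = [l for l in GENERATORS
            if np.allclose(theta_AI(1j * pauli(l)), 1j * pauli(l))]
m_labels = [l for l in GENERATORS
            if np.allclose(theta_AI(1j * pauli(l)), -1j * pauli(l))]

# Magic-basis matrix of Equation (8).
B = np.array([[1, 1j, 0, 0],
              [0, 0, 1j, 1],
              [0, 0, 1j, -1],
              [1, -1j, 0, 0]]) / np.sqrt(2)

def pauli_support(op):
    """Labels of the Pauli strings with a nonzero coefficient in op."""
    return {l for l in LABELS if abs(np.trace(pauli(l).conj().T @ op)) > 1e-12}

# Images of the diagonal Cartan generators under conjugation by B.
images = {l: pauli_support(B @ pauli(l) @ B.conj().T) for l in ("IZ", "ZI", "ZZ")}
```

Here `k_labels` collects the Pauli strings containing exactly one $Y$ (the purely imaginary, antisymmetric ones), and every entry of `images` lies inside $\{XX, YY, ZZ\}$.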

The parameters for an arbitrary two qubit unitary U may be calculated as follows: 

1. We define a new operator $U' = B^{\dagger}UB$ where $B$ is defined in Equation (8).

2. Compute $M^2 = \Theta(U'^{\dagger})U' = (U'^{\dagger})^{*}U' = U'^{T}U'$, which is in the exponentiation of $\mathfrak{m}$.

3. Diagonalize: $M^2 = PDP^{\dagger}$ where $D \in \exp(\mathfrak{h})$ and $P \in SO(4)$.

4. Find $D^{1/2}$ and hence $K' = U'PD^{-1/2}P^{\dagger}$.

5. $K'P$ and $P^{\dagger}$ are both elements of $SO(4)$, so $K_1 = BK'PB^{\dagger}$ and $K_2 = BP^{\dagger}B^{\dagger}$ are
elements of $SU(2) \otimes SU(2)$, and $A = BD^{1/2}B^{\dagger} \in \exp(\mathfrak{h}')$. Hence

$$K_1AK_2 = BK'PB^{\dagger}\,BD^{1/2}B^{\dagger}\,BP^{\dagger}B^{\dagger} = BK'PD^{1/2}P^{\dagger}B^{\dagger} = BU'B^{\dagger} = U \tag{10}$$

is a Cartan decomposition of $U$ of the type used by Vatan and Williams.

6. Simple algebraic manipulations of $A$ yield the parameters $\alpha$, $\beta$ and $\gamma$ which appear
in the center portion of the circuit in Figure 6 of [33], and the partial trace may be used
to separate $K_1$ and $K_2$ into the local operations of which they are composed, which may
then be decomposed as described in the previous section.
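
Steps 1 to 5 can be sketched in Python as below. This is an illustrative sketch under stated assumptions rather than the authors' code: the names `two_qubit_kak` and `is_local` are hypothetical; the real orthogonal $P$ is obtained by diagonalizing a random real combination of the (commuting, real symmetric) real and imaginary parts of $M^2$, which assumes the eigenvalues of $M^2$ are non-degenerate; and the sign of one square-root entry is adjusted so that $K'$ has determinant $+1$.

```python
import numpy as np

# Magic-basis matrix of Equation (8).
B = np.array([[1, 1j, 0, 0],
              [0, 0, 1j, 1],
              [0, 0, 1j, -1],
              [1, -1j, 0, 0]]) / np.sqrt(2)

def two_qubit_kak(U, seed=0):
    """Return (K1, A, K2) with U = K1 @ A @ K2 for U in SU(4)."""
    Bd = B.conj().T
    Up = Bd @ U @ B                        # step 1: U' = B^dagger U B
    M2 = Up.T @ Up                         # step 2: M^2 = U'^T U'
    # Step 3: M^2 is symmetric and unitary, so Re(M^2) and Im(M^2) are
    # commuting real symmetric matrices with a common real eigenbasis P.
    a, b = np.random.default_rng(seed).normal(size=2)
    _, P = np.linalg.eigh(a * M2.real + b * M2.imag)
    if np.linalg.det(P) < 0:               # force P into SO(4)
        P[:, 0] = -P[:, 0]
    d = np.diag(P.T @ M2 @ P)
    # Step 4: D^{1/2}, with the branch fixed so that det D^{1/2} = +1.
    d_half = np.exp(0.5j * np.angle(d))
    if np.prod(d_half).real < 0:
        d_half[0] = -d_half[0]
    Kp = Up @ P @ np.diag(d_half.conj()) @ P.T   # K' = U' P D^{-1/2} P^dagger
    # Step 5: map the factors back from the magic basis.
    K1 = B @ Kp @ P @ Bd
    A = B @ np.diag(d_half) @ Bd
    K2 = B @ P.T @ Bd
    return K1, A, K2

def is_local(K):
    """True if the 4x4 matrix K is a tensor product of two 2x2 matrices."""
    R = K.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(4, 4)
    s = np.linalg.svd(R, compute_uv=False)
    return bool(np.allclose(s[1:], 0))
```

`is_local` tests for operator-Schmidt rank one, which is one simple way to confirm that $K_1$ and $K_2$ are products of single-qubit operations before splitting them with the partial trace as in step 6.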

4. The QSD from Cartan Involutions 

Figure 1. The CNOT optimized universal two qubit circuit; $U_a$, $U_b$, $V_a$, and $V_b$ may
be decomposed into 3 single qubit rotations each by the Euler angle decomposition
given above, and $V_a$ and $U_b$ may absorb the z-rotations preceding and following them
respectively, yielding a circuit consisting of 3 CNOT gates and 15 single qubit rotations.

In this section we give a Cartan decomposition and constructive algorithm for obtaining
the QSD (recall that it is not possible to exceed the QSD's efficiency by even a
factor of two for any number of qubits). The algorithm produces
circuits which are less than half as long as those of the constructive algorithms of Nakajima,
Kawano and Sekigawa [HI H5]. The principal difference between those algorithms and
the QSD is that they proceed by reducing an n-qubit circuit to a circuit involving 
uniformly controlled $(n-1)$-qubit gates. These uniformly controlled gates are then
reduced to controlled and uncontrolled $(n-1)$-qubit gates. The uncontrolled $(n-1)$-qubit
gates and the controlled $(n-1)$-qubit gates are then factored again using the Cartan
decomposition. However, all gates obtained by this decomposition must be controlled, 
leading to a doubling of the number of CNOTs over the best known decompositions. 
This problem arises because only part of the decomposition is handled at the Lie algebra 
level - after the first decomposition circuit identities are introduced before the Cartan 
decomposition is applied again. In what follows we take the Lie algebraic point of 
view throughout: the uniformly controlled operations are treated as a Lie-subgroup, 
and a Cartan decomposition of the corresponding Lie-subalgebra is obtained. This 
Cartan decomposition results in uncontrolled $(n-1)$-qubit operations which remain to be
factored, and so the first part of the algorithm of Nakajima, Kawano and Sekigawa can be 
applied again. The resulting algorithm is an alternating pair of Cartan decompositions, 
each of which has a simple Cartan involution which enables the factors to be obtained 
explicitly. Inspection of the resulting procedure reveals precisely the QSD of [35] and so 
this algorithm gives a Cartan decomposition based derivation of the QSD and a Cartan 
involution based explicit algorithm for obtaining the QSD. 

Because every other step in our recursive procedure is identical to the first step 
of Nakajima, Kawano and Sekigawa's algorithm, we first define the corresponding
components $\mathfrak{k}$ and $\mathfrak{m}$ of the Cartan decomposition of $SU(2^n)$, and the Cartan subalgebra
$\mathfrak{h}$. The $\mathfrak{k}$-subalgebra is of type AIII: the direct sum of two lower dimensional unitary
Lie algebras, $\mathfrak{k} = \mathfrak{s}[\mathfrak{u}(p) \oplus \mathfrak{u}(q)]$ where $p + q = 2^n$.

Definition: For the n-qubit case the decomposition is defined by:

$$\mathfrak{k} = \mathrm{span}_{\mathbb{R}}\{A \otimes Z,\ B \otimes 1,\ iZ^{(n)} \mid A, B \in \mathfrak{su}(2^{n-1})\} \tag{11}$$

$$\mathfrak{m} = \mathrm{span}_{\mathbb{R}}\{A \otimes X,\ B \otimes Y,\ iX^{(n)},\ iY^{(n)} \mid A, B \in \mathfrak{su}(2^{n-1})\} \tag{12}$$

Definition: The Cartan involution is:

$$\theta(u) = Z^{(n)} u Z^{(n)}, \qquad \Theta(U) = Z^{(n)} U Z^{(n)} \tag{13}$$

Hence we may compute the global Cartan decomposition $G = KM$ of $SU(2^n)$ as in
Theorem 2.
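
As a quick numerical sanity check of Equations (11) to (13), the following Python sketch confirms that conjugation by $Z^{(n)}$ fixes the generators listed in Equation (11) and negates those in Equation (12). The helper names `random_su`, `theta` and `eigenspace` are hypothetical, and the n-th qubit is assumed to be the last tensor factor, consistent with $Z^{(3)} = IIZ$.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def random_su(d, rng):
    """Random traceless anti-Hermitian d x d matrix, i.e. an element of su(d)."""
    H = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    H = H + H.conj().T
    H = H - np.trace(H) / d * np.eye(d)
    return 1j * H

def theta(u):
    """theta(u) = Z^(n) u Z^(n), with Z acting on the last (n-th) qubit."""
    Zn = np.kron(np.eye(u.shape[0] // 2), Z)
    return Zn @ u @ Zn

def eigenspace(u):
    """Return +1 if theta(u) = u, -1 if theta(u) = -u, and 0 otherwise."""
    if np.allclose(theta(u), u):
        return 1
    if np.allclose(theta(u), -u):
        return -1
    return 0
```

For example, with `A = random_su(4, rng)` the three-qubit generators `np.kron(A, Z)` and `np.kron(A, I2)` land in the $+1$ eigenspace, while `np.kron(A, X)` and `np.kron(A, Y)` land in the $-1$ eigenspace.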

We must now define a Cartan subalgebra $\mathfrak{h}$ contained in $\mathfrak{m}$. Here Nakajima et al.
make a different choice of $\mathfrak{h}$ to that used by Khaneja and Glaser in [38] and [39]. Recall
that all maximal Abelian subalgebras share an adjoint orbit, namely $\mathfrak{m}$ itself, and that
one may, as a result, switch between them with relative ease.

Definition: Nakajima, Kawano and Sekigawa choose to define

$$\mathfrak{h} = \mathrm{span}_{\mathbb{R}}\{\, |j\rangle\langle j| \otimes i\sigma_x \mid j = 0, \ldots, 2^{n-1} - 1 \,\} \tag{14}$$

The algorithm of [45] based upon this choice of $\mathfrak{h}$ corresponds to a decomposition of
an n-qubit quantum logic circuit into $2^{n-1} - 1$ uniformly controlled one qubit elementary
rotations, requiring $4^n - 2^{n-1}$ CNOT gates.

Note that Nakajima, Kawano and Sekigawa Cartan decompose $SU(2^n)$, yielding 2
elements of $SU(2^{n-1}) \oplus SU(2^{n-1})$. These are then implicitly treated as if they were 4
elements of $SU(2^{n-1})$ with no further special structure, and precisely the same Cartan
decomposition is applied to each of these smaller unitary operators. This approach is
implicitly based on the assumption that the tensor sum of Cartan decompositions is the
Cartan decomposition of tensor sums. This assumption, however, can easily be proven
to be false. Thus, we now set out to find a Cartan decomposition of the Lie algebra
$\mathfrak{s}[\mathfrak{u}(2^{n-1}) \oplus \mathfrak{u}(2^{n-1})]$.

Consider the basis of $\mathfrak{s}[\mathfrak{u}(2^{n-1}) \oplus \mathfrak{u}(2^{n-1})]$: $\mathrm{span}_{\mathbb{R}}\{A \otimes Z,\ B \otimes 1,\ iZ^{(n)} \mid A, B \in
\mathfrak{su}(2^{n-1})\}$.

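Definition: It is straightforward to confirm that the decomposition

$$\mathfrak{k}' = \mathrm{span}_{\mathbb{R}}\{A \otimes 1,\ iZ^{(n)} \mid A \in \mathfrak{su}(2^{n-1})\}$$

$$\mathfrak{m}' = \mathrm{span}_{\mathbb{R}}\{A \otimes Z \mid A \in \mathfrak{su}(2^{n-1})\}$$

satisfies the definition of a Cartan decomposition for $\mathfrak{s}[\mathfrak{u}(2^{n-1}) \oplus \mathfrak{u}(2^{n-1})]$. Notice that
$iZ^{(n)}$ represents a phase and commutes with every element of $\mathfrak{k}'$; indeed it commutes
with every element of $\mathfrak{s}[\mathfrak{u}(2^{n-1}) \oplus \mathfrak{u}(2^{n-1})]$. We may factor out the $Z^{(n)}$ component from
$\mathfrak{s}[\mathfrak{u}(2^{n-1}) \oplus \mathfrak{u}(2^{n-1})]$ to get $\mathfrak{su}(2^{n-1}) \oplus \mathfrak{su}(2^{n-1})$. If we define $\tilde{\mathfrak{k}}' = \mathfrak{k}' \setminus \mathrm{span}_{\mathbb{R}}\{iZ^{(n)}\}$, then
$\mathfrak{su}(2^{n-1}) \oplus \mathfrak{su}(2^{n-1}) = \tilde{\mathfrak{k}}' \oplus \mathfrak{m}'$ is a Cartan decomposition.

Definition: A Cartan involution to separate these subsets is $\theta(m) = X^{(n)} m X^{(n)}$.
Furthermore we find that if we apply this involution to an element of $\mathfrak{s}[\mathfrak{u}(2^{n-1}) \oplus \mathfrak{u}(2^{n-1})]$
which has not had its $Z^{(n)}$ phase factored out, the phase lands in the $-1$ eigenspace.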

Figure 2. A simplified three qubit circuit based on Nakajima, Kawano and
Sekigawa's algorithm: uniformly controlled two qubit operations are built using
Vatan and Williams' optimal two qubit circuit to produce a universal 44 CNOT
three qubit circuit with a constructive algorithm. The operator represented here is
$(U_1 \otimes |0\rangle\langle 0| + V_1 \otimes |1\rangle\langle 1|)(R_{x1} \oplus R_{x2} \oplus R_{x3} \oplus R_{x4})(U_2 \otimes |0\rangle\langle 0| + V_2 \otimes |1\rangle\langle 1|)$, in accordance
with the NKS algorithm.



We must also choose a Cartan subalgebra in $\mathfrak{m}'$; for simplicity, we choose the set of
diagonal elements of $\mathfrak{m}'$: $\mathfrak{h}' = \mathrm{span}_{\mathbb{R}}\, i\{IZZ, ZIZ, ZZZ\}$ in the three qubit case.

We now compute the Cartan $KAK$ factors of an arbitrary element $G$ of
$S[U(2^{n-1}) \oplus U(2^{n-1})]$. First we use the method of Theorem 2 to compute the component
of $G$ not in $\exp(\mathfrak{k}')$, i.e. we compute $\tilde{M}^2 = M^2P^2$ where $M$ is from $G = KM$ and $P$ is the
$Z^{(n)}$ factor. Next we diagonalize $\tilde{M}^2$; this diagonal matrix is $A^2P^2$, where $M = LAL^{\dagger}$
for $A \in \exp(\mathfrak{h}')$, $L \in \exp(\mathfrak{k}')$. Finally we take the square root of this diagonal matrix
and compute $K$. To be completely explicit, we present here the algorithm.

1. Compute $\tilde{M}^2 = M^2P^2 = \Theta(G^{\dagger})G$ where $\Theta(U) = X^{(n)}UX^{(n)}$ (see Theorem 2).

2. Compute the eigenvalue decomposition of $\tilde{M}^2$: let $\tilde{M}^2 = LDL^{\dagger}$ be the eigenvalue
decomposition. Since $D$ is diagonal and unitary it must be an element of the
exponentiation of $\mathfrak{h}' \cup \mathrm{span}_{\mathbb{R}}\{iZ^{(n)}\}$, and $L$ must be an element of $\exp(\mathfrak{k}')$.

3. Compute $\tilde{A} = D^{1/2} = AP$ where $A \in \exp(\mathfrak{h}')$ and $P$ is the phase term. Each entry
in the diagonal unitary $D$ is of the form $e^{i\theta}$, so we may simply replace each of these
entries with $e^{i\theta/2}$ and we have $\tilde{A}$. Now $\tilde{M} = L\tilde{A}L^{\dagger}$.

4. Compute $K = G\tilde{M}^{\dagger}$. We have $G = P(KLAL^{\dagger})$ where $P$ commutes with all of the
other factors and therefore may be placed according to convenience, $K, L \in \exp(\mathfrak{k}')$ and
$A \in \exp(\mathfrak{h}')$; that is, $K$ and $L$ are general $(n-1)$-qubit operations which leave the low
qubit fixed and $A$ is a uniformly controlled z-rotation on the low qubit.
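
The four steps can be sketched in block form. The Python sketch below is an illustration under assumptions, not the authors' implementation: it assumes the n-th (low) qubit is the last tensor factor, so that $G = G_0 \otimes |0\rangle\langle 0| + G_1 \otimes |1\rangle\langle 1|$ and the $|0\rangle\langle 0|$ block of $\Theta(G^{\dagger})G$ is $G_1^{\dagger}G_0$; it assumes the eigenvalues of that block are non-degenerate, so that `numpy.linalg.eig` returns a unitary eigenvector matrix up to rounding; and the names `demultiplex` and `assemble` are hypothetical.

```python
import numpy as np

P0 = np.diag([1.0, 0.0]).astype(complex)   # |0><0| on the low qubit
P1 = np.diag([0.0, 1.0]).astype(complex)   # |1><1| on the low qubit

def assemble(G0, G1):
    """Full matrix G0 (x) |0><0| + G1 (x) |1><1| (low qubit is the last factor)."""
    return np.kron(G0, P0) + np.kron(G1, P1)

def demultiplex(G0, G1):
    """Return (K, L, a) with G0 = K L diag(a) L^dagger, G1 = K L diag(a*) L^dagger."""
    # Step 1 (block form): the |0><0| block of X^(n) G^dagger X^(n) G.
    M2_block = G1.conj().T @ G0
    # Step 2: eigenvalue decomposition of the unitary block.
    d, L = np.linalg.eig(M2_block)
    # Step 3: replace each eigenvalue e^{i theta} by e^{i theta / 2}.
    a = np.exp(0.5j * np.angle(d))
    # Step 4: K = G M~^dagger; its two blocks coincide, so one block suffices.
    K = G0 @ L @ np.diag(a.conj()) @ L.conj().T
    return K, L, a

def reconstruct(K, L, a):
    """(K L (x) 1) A~ (L^dagger (x) 1), with A~ diagonal on the low qubit."""
    I2 = np.eye(2)
    A_tilde = assemble(np.diag(a), np.diag(a.conj()))
    return np.kron(K @ L, I2) @ A_tilde @ np.kron(L.conj().T, I2)
```

In this form $\tilde{A}$ is diagonal with entries $e^{\pm i\theta_j/2}$ on the low qubit, which is the uniformly controlled z-rotation (including the phase $P$) described in step 4.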

The operations in $\exp(\mathfrak{k}')$ do nothing to the n-th qubit and can perform any unitary
operation on the remaining $n - 1$ qubits, i.e. we can treat them precisely as we would
any element of $SU(2^{n-1})$, and we may absorb the diagonal $P$ into $A$ and implement
$\tilde{A} = AP$ according to the decomposition offered in [35], which leaves us with a uniformly
controlled z-rotation on the low qubit and a diagonal operator acting on the remaining
qubits which may simply be absorbed into a neighboring $(n-1)$-qubit operation.

Given an operation on any number of qubits n, we apply Nakajima, Kawano and 
Sekigawa's algorithm to produce 2 elements of $S[U(2^{n-1}) \oplus U(2^{n-1})]$, then we apply
the algorithm we have just described to these uniformly controlled operations to yield 4
elements of $SU(2^{n-1})$ to which we apply the NKS algorithm, and so on, until we are left
with $4^{n-2}$ two qubit operations, to which we apply the AI algorithm described earlier.
This recursive decomposition scheme generates a complete constructive factorization
(see Figure 3 for the three qubit case and Figure 4 for an illustration of the recursion
applied to four qubits). Using no further refinements, this algorithm yields a 24 CNOT
three qubit gate and has an asymptotic CNOT cost of $\frac{9}{16}4^n - \frac{3}{2}2^n$, an improvement of
nearly a factor of two over the standard NKS circuit.
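
The gate counts quoted above can be reproduced with a short calculation. The Python sketch below assumes, as a reading of Figures 3 and 4 rather than a statement taken from the text, that one level of the recursion contributes four $(n-1)$-qubit blocks plus three uniformly controlled rotations costing $2^{n-1}$ CNOTs each, with the three-CNOT two qubit circuit as the base case; the function names are hypothetical.

```python
from fractions import Fraction
from math import ceil

def qsd_cnots(n):
    """CNOT count of the unoptimized recursive scheme on n >= 2 qubits.

    Assumed recursion: four (n-1)-qubit blocks plus three uniformly
    controlled rotations, each costing 2**(n-1) CNOTs.
    """
    if n == 2:
        return 3                      # optimal two-qubit circuit
    return 4 * qsd_cnots(n - 1) + 3 * 2 ** (n - 1)

def qsd_closed_form(n):
    """Closed form (9/16) 4^n - (3/2) 2^n of the recursion above."""
    return Fraction(9, 16) * 4 ** n - Fraction(3, 2) * 2 ** n

def qsd_optimized(n):
    """Optimized count (23/48) 4^n - (3/2) 2^n + 4/3 quoted from [35]."""
    return Fraction(23, 48) * 4 ** n - Fraction(3, 2) * 2 ** n + Fraction(4, 3)

def lower_bound(n):
    """Lower bound ceil((4^n - 3n - 1) / 4) quoted from [30]."""
    return ceil(Fraction(4 ** n - 3 * n - 1, 4))
```

For three qubits these give 24 CNOTs for the unoptimized scheme, 20 for the optimized count and 14 for the lower bound, matching the figures quoted in the text.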

5. Conclusions and Future Work 

This scheme of alternating Cartan decompositions of $\mathfrak{su}(2^n)$ with Cartan decompositions
of $\mathfrak{s}[\mathfrak{u}(2^{n-1}) \oplus \mathfrak{u}(2^{n-1})]$ is the best known circuit decomposition paradigm. This chain
of decompositions yields precisely the QSD circuit structure that Shende, Bullock and
Markov derived by analogy from the classical Shannon decomposition in [35]. Further
slight improvements can be made to the CNOT cost of the tensor sum Cartan circuit
by the application of the identities given in Appendix A and Theorem 14 of [35],
reducing the overall cost of a three qubit gate to 20 CNOTs, and the asymptotic cost
to $\frac{23}{48}4^n - \frac{3}{2}2^n + \frac{4}{3}$, but the decomposition is still fundamentally the same, and these
simplifications can be incorporated into the constructive algorithm presented here with
very little effort. By constructing the QSD from its Lie algebraic roots this work puts
the QSD (the best known generic quantum circuit decomposition, less than a factor
of two from the highest lower bound) into its proper Lie algebraic context as a series
of Cartan decompositions, and provides a new Cartan involution based algorithm to
implement the QSD explicitly.

Figure 3. The 24 CNOT universal three qubit quantum circuit derived without
further simplification from the Cartan decomposition of $\mathfrak{s}[\mathfrak{u}(2^{n-1}) \oplus \mathfrak{u}(2^{n-1})]$.

Another significant advantage of this sort of decomposition, especially in light of
the fact that historically few-qubit circuit optimization has at times advanced ahead
of asymptotic circuit optimization (cf. [31]), is that any future improvements to
few-qubit efficiency can simply be plugged into this algorithm at its lowest level of recursion
(where we turn to Vatan and Williams' circuit in this case) and translated instantly
into improved asymptotic gate counts. For example, one could use existing methods
(e.g. [51, 52]) to test whether a particular two-qubit gate has non-generic structure which
means that it requires one or two CNOT gates rather than three. Substantially shorter
circuits could be obtained by the application of such methods, and by their extension
to three qubit circuits.

Figure 4. A block diagram of the QSD applied to a four qubit operation; notice that
it consists of only 3 thrice controlled rotations on the low qubit and 4 general three
qubit QSD circuits on the higher qubits.